Linguistic Tests for Discourse Relations in the TüBa-D/Z Corpus of Written German

Authors

  • Yannick Versley
  • Anna Gastel
Abstract

Discourse structure and discourse relations are an important ingredient in systems for the analysis of text that go beyond the boundary of single clauses. Discourse relations often indicate important additional information about the connection between two clauses, such as causality, and are widely believed to influence aspects of reference resolution. More so than referential annotation, discourse relation annotation is made difficult by the absence of a general consensus on the underlying linguistic phenomena that should be targeted, as well as by a lack of strong predictions about the possible or permissible interactions between these phenomena. It is sometimes claimed that the structuring of discourse is only weakly constrained and that, as a result, capturing discourse structure and discourse relations will always lead to poor reproducibility of the annotation task. We argue in this paper that an explicit notion of the relata of discourse relations makes it possible to delimit the annotation scope and to draw on theoretical accounts of the linguistic phenomena involved, without giving up the goal of theory-neutrality that is essential to ensuring that a given resource stays useful to a large community of users. In this article, we first present the general choices that must be made in designing an annotation scheme for discourse structure and discourse relations. In the second part, we present the scheme used in our annotation of selected articles from the TüBa-D/Z treebank of German (Telljohann et al., 2009). The scheme used in the annotation is theory-neutral, but informed by more detailed linguistic knowledge in the form of linguistic tests that can help disambiguate between several plausible relations.

Similar resources

Clausal Coordinate Ellipsis and its Varieties in Spoken German: A Study with the TüBa-D/S Treebank of the VERBMOBIL Corpus

Grammar rules for Clausal Coordinate Ellipsis (CCE) are based nearly exclusively on linguistic judgments (intuitions). For German, the extent to which grammar rules based on this type of empirical evidence generate all and only CCE structures that populate text corpora has only been explored with the TIGER treebank of written newspaper text. How well these rules fit spoken German is unknown. I...

Recent Developments in Linguistic Annotations of the TüBa-D/Z Treebank

The data is taken from daily issues of the German newspaper 'die Tageszeitung' (taz) currently ranging from May 3 to May 7 1999 as well as April 3

Annotating Discourse Anaphora

In this paper, we present preliminary work on corpus-based anaphora resolution of discourse deixis in German. Our annotation guidelines provide linguistic tests for locating the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus.

Subgraph-based Classification of Explicit and Implicit Discourse Relations

Current approaches to recognizing discourse relations rely on a combination of shallow, surface-based features (e.g., bigrams, word pairs) and rather specialized hand-crafted features. As a way to avoid both the shallowness of word-based representations and the lack of coverage of specialized linguistic features, we use a graph-based representation of discourse segments, which allows for a more...

Genre Analysis of ELT and Nursing Academic Written Discourse through Introduction

Since Swales’ (1981, 1990) CARS model of the move structure of research articles, many studies on genre analysis have been carried out, among which work on different parts of research articles in various disciplines has produced a considerable literature. This study aims to investigate the rhetorical structure of the Introduction sections of articles in two fields of English Language Teaching (...

Publication date: 2012